Subgroups of adult-onset diabetes: a data-driven cluster analysis in a Ghanaian population

Adult-onset diabetes mellitus (here: aDM) is not a uniform disease entity. In European populations, five diabetes subgroups have been identified by cluster analysis using simple clinical variables; these may elucidate diabetes aetiology and disease prognosis. We aimed at reproducing these subgroups among Ghanaians with aDM, and establishing their importance for diabetic complications in different health system contexts. We used data of 541 Ghanaians with aDM (age: 25–70 years; male sex: 44%) from the multi-center, cross-sectional Research on Obesity and Diabetes among African Migrants (RODAM) Study. Adult-onset DM was defined as fasting plasma glucose (FPG) ≥ 7.0 mmol/L, documented use of glucose-lowering medication or self-reported diabetes, and age of onset ≥ 18 years. We derived subgroups by cluster analysis using (i) a previously published set of variables: age at diabetes onset, HbA1c, body mass index, HOMA-beta, HOMA-IR, positivity of glutamic acid decarboxylase autoantibodies (GAD65Ab), and (ii) Ghana-specific variables: age at onset, waist circumference, FPG, and fasting insulin. For each subgroup, we calculated the clinical, treatment-related and morphometric characteristics, and the proportions of objectively measured and self-reported diabetic complications. We reproduced the five subgroups: cluster 1 (obesity-related, 73%) and cluster 5 (insulin-resistant, 5%) with no dominant diabetic complication patterns; cluster 2 (age-related, 10%) characterized by the highest proportions of coronary artery disease (CAD, 18%) and stroke (13%); cluster 3 (autoimmune-related, 5%) showing the highest proportions of kidney dysfunction (40%) and peripheral artery disease (PAD, 14%); and cluster 4 (insulin-deficient, 7%) characterized by the highest proportion of retinopathy (14%). 
The second approach yielded four subgroups: obesity- and age-related (68%) characterized by the highest proportion of CAD (9%); body fat-related and insulin-resistant (18%) showing the highest proportions of PAD (6%) and stroke (5%); malnutrition-related (8%) exhibiting the lowest mean waist circumference and the highest proportion of retinopathy (20%); and ketosis-prone (6%) with the highest proportion of kidney dysfunction (30%) and urinary ketones (6%). With the same set of clinical variables, the previously published aDM subgroups can largely be reproduced by cluster analysis in this Ghanaian population. This method may generate in-depth understanding of the aetiology and prognosis of aDM, particularly when choosing variables that are clinically relevant for the target population.


Methods
Study design and population. The Research on Obesity and Diabetes among African Migrants (RODAM) study is a multi-center cross-sectional study that was conducted between July 2012 and September 2015. It involved Ghanaian adults (25-70 years) living in rural and urban Ghana (Ashanti region), and three European cities (Amsterdam, London, and Berlin) (N = 6385) 3 , with a crude prevalence of aDM of 9% 18 . The primary objective of the RODAM study was to disentangle the relative contributions of (epi)genetic and non-genetic risk factors for obesity and diabetes mellitus. The study protocol and the procedures of the RODAM study have previously been published 19 . Medical history, lifestyle and socio-economic factors were recorded either by ethnically matched personnel in questionnaire-based interviews or were self-administered. Specifically, we included questions about the age of disease onset, previous diagnosis of diabetes mellitus, the start of medication prescription, and the type of glucose-lowering medications.
Biomaterial collection and laboratory measurements. Adult-onset diabetes mellitus was defined according to WHO guidelines as fasting plasma glucose ≥ 7.0 mmol/L, documented use of glucose-lowering medication or self-reported diabetes. We excluded individuals with age of disease onset before 18 years. Fasting venous blood samples and urine samples were collected by trained personnel according to standard operating procedures; blood samples were centrifuged. Urine, serum and plasma samples were ultimately stored at − 80 °C. All biochemical analyses were performed using an ABX Pentra 400 chemistry analyser (ABX Pentra; Horiba ABX, Germany).
The biomarker profile of this study comprised HbA1c, serum insulin, blood lipids (total cholesterol, triglyceride, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol), inflammatory markers (C-reactive protein), and liver enzymes (aspartate aminotransferase (AST), alanine aminotransferase (ALT), and γ-glutamyl transferase (GGT)). We calculated the Homeostatic Model Assessment for insulin resistance (HOMA-IR) and for beta-cell capacity (HOMA-beta) according to Matthews et al. 20 , using fasting plasma glucose and fasting insulin in the assessments. Serum creatinine concentration was determined by a kinetic colorimetric spectrophotometric isotope dilution mass spectrometry calibration method (Roche Diagnostics). Estimated glomerular filtration rate (eGFR) was calculated using the 2009 CKD-EPI (CKD Epidemiology Collaboration) creatinine equation, and the severity of kidney disease was categorized according to the 2012 KDIGO guidelines 21 . Urinary albumin concentration (in mg/L) was measured by an immunochemical turbidimetric method (Roche Diagnostics). Urinary creatinine concentration (in μmol/L) was measured by a kinetic spectrophotometric method (Roche Diagnostics). Urinary albumin-creatinine ratio (ACR; expressed in mg/g) was calculated by taking the ratio between urinary albumin and urinary creatinine and stratified according to the 2012 KDIGO classifications: A1, < 3 mg/mmol (normal to mildly increased); A2, 3 to 30 mg/mmol (moderately increased); and A3, > 30 mg/mmol (severely increased). Antibodies against glutamic acid decarboxylase (GAD65Ab) were determined by radioligand binding assay (RBA) using established cut-offs (Ghana: > 121 U/mL; Europe: > 97 U/mL), as previously described by Hampe et al. 22 .
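The derived indices in this paragraph can be illustrated with a minimal Python sketch. It is not the study's SAS code. The HOMA formulas are the standard fasting-state approximations from Matthews et al.; the insulin unit (μU/mL) is an assumption, since the text does not state it. The ACR helper works in mg/mmol so that the result can be compared directly with the KDIGO thresholds quoted above. All function names are hypothetical.

```python
def homa_ir(fpg_mmol_l: float, insulin_uu_ml: float) -> float:
    """HOMA-IR = fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return fpg_mmol_l * insulin_uu_ml / 22.5


def homa_beta(fpg_mmol_l: float, insulin_uu_ml: float) -> float:
    """HOMA-beta (%) = 20 x fasting insulin (uU/mL) / (fasting glucose - 3.5)."""
    if fpg_mmol_l <= 3.5:
        # The formula has no meaningful value at or below 3.5 mmol/L.
        raise ValueError("HOMA-beta is undefined for glucose <= 3.5 mmol/L")
    return 20.0 * insulin_uu_ml / (fpg_mmol_l - 3.5)


def acr_mg_per_mmol(albumin_mg_l: float, creatinine_umol_l: float) -> float:
    """Urinary albumin (mg/L) divided by urinary creatinine (mmol/L)."""
    return albumin_mg_l / (creatinine_umol_l / 1000.0)


def kdigo_albuminuria_category(acr: float) -> str:
    """Map an ACR in mg/mmol to the A1/A2/A3 categories quoted in the text."""
    if acr < 3.0:
        return "A1"  # normal to mildly increased
    if acr <= 30.0:
        return "A2"  # moderately increased
    return "A3"      # severely increased
```

For example, `kdigo_albuminuria_category(acr_mg_per_mmol(30.0, 10000.0))` evaluates an ACR of 3 mg/mmol, which falls at the lower bound of category A2.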
Physical examinations. Trained study personnel performed the physical examinations. Anthropometric measurements were taken in light clothes and without shoes, including weight (kg) by a person scale, height (cm) by a stadiometer, and waist circumference (cm) using a measuring tape (all devices SECA, Germany). We calculated Body Mass Index (BMI) as weight over squared height (kg/m²). Blood pressure (BP) was measured three times using a validated semiautomated device (Microlife WatchBP Home, Widnau, Switzerland), with appropriately sized cuffs after at least 5 min rest while seated. The mean of the last two BP measurements was used for the analyses. Hypertension was defined as systolic BP ≥ 140 mmHg and/or diastolic BP ≥ 90 mmHg, and/or being on antihypertensive medication treatment. Ankle brachial index (ABI), the ratio of the resting systolic blood pressure at the ankle to the resting systolic brachial pressure at the arm, was obtained from two blood pressure measurements on the left side (leg and arm) and two on the right side (leg and arm) using the Microlife WatchBP Office ABI with appropriately sized cuffs, after at least 10 min of supine rest. The cuffs for measuring the ankle and brachial pressures were placed just above the ankle and at the upper arm, respectively.
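As a minimal sketch of the derived measures described in this paragraph (BMI, the analysis blood pressure, the hypertension definition, and the ABI ratio), the following Python helpers may be useful. The function names are hypothetical, and the text does not say how left- and right-side ABI values were combined, so the sketch computes the ratio for a single side only.

```python
from statistics import mean


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2; height is recorded in cm, so convert first."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def analysis_bp(readings):
    """Mean of the last two of three (systolic, diastolic) readings in mmHg."""
    last_two = readings[-2:]
    systolic = mean(s for s, _ in last_two)
    diastolic = mean(d for _, d in last_two)
    return systolic, diastolic


def is_hypertensive(systolic: float, diastolic: float, on_medication: bool) -> bool:
    """Systolic >= 140 and/or diastolic >= 90 mmHg and/or antihypertensive use."""
    return systolic >= 140 or diastolic >= 90 or on_medication


def ankle_brachial_index(ankle_systolic: float, brachial_systolic: float) -> float:
    """Ratio of ankle to brachial resting systolic pressure for one body side."""
    return ankle_systolic / brachial_systolic
```

With three hypothetical readings `[(150, 95), (138, 88), (136, 86)]`, the first reading is discarded and the analysis values are 137/87 mmHg, which is below the hypertension thresholds unless the participant is on antihypertensive medication.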
Definitions of diabetic complications. Validated methods were used to detect kidney dysfunction, peripheral artery disease (PAD) and coronary artery disease (CAD). Albuminuria was defined as urinary albumin-creatinine ratio ≥ 3 mg/mmol, and impaired eGFR was based on CKD-EPI eGFR criteria, comprising G3 to G5. Nephropathy was defined as albuminuria or microalbuminuria (stages 2-4) according to the report from the Joint Committee on Diabetic Nephropathy 23 . PAD was defined as ankle brachial index (ABI) < 0.9 24 . CAD was assessed using the WHO Rose angina questionnaire 25 . We used self-reported data for retinopathy and stroke. Retinopathy was assessed by a positive response to the question 'Have you ever been told by a doctor that you have eye disease or eye damage as a result of diabetes?'. Stroke was assessed by a positive reply to the question 'Have you ever had a stroke?'.
Statistical analysis. All analyses were conducted using the software SAS, version 9.3 (SAS Institute, Cary, North Carolina, USA).
Missing data handling. Among the 541 participants with aDM, there were 140 individuals with missing or implausible values (< 2nd or ≥ 98th percentile) in any of the variables of interest. These variables comprised socio-economic data (n = 140), anthropometric measures (n = 52), biochemical data (n = 40), and lifestyle information (n = 24). To avoid selection bias and to improve statistical power, we applied multiple imputation using the discriminant fully conditional specification (FCS) method and derived 10 imputed datasets. FCS is also known as multiple imputation by chained equations (MICE) and uses separate conditional univariate imputation models specified for each incomplete variable, with other variables as predictors. Multiple imputation was applied under the assumption that the propensity of missing data can be explained by the observed data (missing at random, MAR). This assumption was supported by the good imputation efficiency (98.2-99.9%).
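The rule for identifying values to be imputed can be sketched as follows. This is an illustration only: the percentile interpolation method used in the study is not stated, so the sketch assumes the default method of Python's `statistics.quantiles`, and the function name `flag_for_imputation` is hypothetical. The imputation step itself (FCS/MICE) is not reproduced here.

```python
from statistics import quantiles


def flag_for_imputation(values, lower_pct=2, upper_pct=98):
    """Return one boolean per value: True if missing (None) or implausible.

    A value counts as implausible when it lies below the lower percentile
    or at/above the upper percentile of the observed values.
    """
    observed = [v for v in values if v is not None]
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(observed, n=100)
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v is None or v < low or v >= high for v in values]
```

Flagged entries would then be set to missing and passed to the imputation model together with the values that were missing from the outset.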
Cluster analysis. Two approaches were applied to derive aDM subgroups using exploratory cluster analysis.
First, we reconstructed the clusters by Ahlqvist et al. 9 using the same six clinical and anthropometric variables (first approach). These were chosen because they are readily available in routine clinical practice: age at onset of diabetes mellitus, body mass index (BMI), HbA1c, HOMA-beta, HOMA-IR, and the presence of glutamic acid decarboxylase antibodies (GAD65Ab). We standardized all variables to a mean value of 0 and a standard deviation (SD) of 1. Then, a two-step cluster analysis was applied, in which the first step extracted a predefined number of clusters on the basis of silhouette width (n = 5). This step used log-likelihood as a distance measure and Schwarz's Bayesian criterion for clustering. The second step involved k-means clustering to confirm the stability of clusters. In the k-means clustering, GAD65Ab presence was not included, because this analysis can only accommodate continuous variables. Cluster labels were assigned according to previously published subgroups by Ahlqvist et al. 9 .
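The standardization and the k-means confirmation step can be sketched in plain Python. This is a textbook Lloyd-style k-means on continuous variables, not the SAS procedure used in the study, and it does not reproduce the log-likelihood-based first step. The function names, the random initialization, and the use of the sample SD are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def standardize(rows):
    """Rescale each column to mean 0 and (sample) SD 1."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, n_iter=100, seed=0):
    """Assign each row to one of k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # k distinct observations as starting centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every observation.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids
```

In a setting like the one described, the rows would hold the continuous clustering variables per participant (for the first approach: age at onset, BMI, HbA1c, HOMA-beta, HOMA-IR), with the binary GAD65Ab indicator left out.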
Second, we repeated the cluster analysis using variables that have established clinical relevance for aDM among Ghanaian populations (second approach). More specifically, we have previously seen that anthropometric measures of abdominal obesity have the best discriminative ability for diabetes mellitus among Ghanaian adults 26,27 , and thus, waist circumference was preferred over BMI. Also, we included FPG as a readily available biomarker in resource-poor settings. Calculations of HOMA-beta and HOMA-IR based on C-peptide do not reflect the important role of serum insulin in aDM. While C-peptide only describes pancreatic insulin release, fasting insulin also reflects impaired hepatic insulin clearance, which likely contributes to insulin resistance 28 . Therefore, we preferred insulin values over C-peptide-based calculations of HOMA. In the second approach, we did not use GAD65Ab, because our previous work in the RODAM Study suggests that GAD65Ab may poorly discriminate for diabetes mellitus 22 and may not be readily available in sub-Saharan Africa. Age at onset of aDM was also included in this second approach. The same two-step cluster analysis was applied to the data as described above, with the modification that 2 to 5 cluster solutions were generated in step 1. The optimal number of clusters to be extracted was identified based on cluster size (> 5% of the population) and the dendrogram of explained variance (R² > 0.25). Cluster labels were assigned by examining the phenotypic characteristics. Two sensitivity analyses were performed. First, we repeated the cluster analysis after exclusion of individuals on insulin treatment to assess the stability of the subgroups. Second, we derived clusters stratified by study location (Ghana and Europe).
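The two selection criteria for the number of clusters can be made concrete with a short sketch. It assumes that R² denotes the share of the total sum of squares explained by the cluster partition and that the size criterion applies to every cluster in a solution; neither detail is spelled out in the text. All names are hypothetical.

```python
from collections import Counter


def cluster_shares(labels):
    """Proportion of observations in each cluster."""
    n = len(labels)
    return {lab: count / n for lab, count in Counter(labels).items()}


def _centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def _sum_of_squares(points, center):
    return sum((x - c) ** 2 for p in points for x, c in zip(p, center))


def r_squared(rows, labels):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    total_ss = _sum_of_squares(rows, _centroid(rows))
    within_ss = 0.0
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        within_ss += _sum_of_squares(members, _centroid(members))
    return 1.0 - within_ss / total_ss


def solution_is_acceptable(rows, labels, min_share=0.05, min_r2=0.25):
    """Check both criteria: every cluster > 5% of the sample and R^2 > 0.25."""
    smallest = min(cluster_shares(labels).values())
    return smallest > min_share and r_squared(rows, labels) > min_r2
```

Applied to the candidate solutions with 2 to 5 clusters, such a check would retain only partitions that are both sufficiently balanced and sufficiently explanatory.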

Characterization of clusters and proportions of diabetic complications.
For each cluster description, we calculated means and SD for normally distributed variables, medians and interquartile ranges (IQR) for skewed variables, and proportions for categorical variables. The proportions of diabetic complications were calculated for each cluster and were compared by χ² test.
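The comparison of complication proportions across clusters rests on Pearson's χ² statistic for a contingency table. The following sketch computes the statistic and its degrees of freedom from a table of counts (rows: clusters; columns: complication present or absent). It is an illustration, not the study's SAS code, and the function name is hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and complication.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

A p-value then follows from the upper tail of the χ² distribution with `dof` degrees of freedom, for instance via `scipy.stats.chi2.sf(statistic, dof)` where SciPy is available. With small expected counts, as may occur in clusters comprising about 5% of 541 participants, the χ² approximation should be interpreted with care.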
Ethics statement. The RODAM Study was conducted according to the guidelines laid down in the 1964 Declaration of Helsinki and its later amendments. All procedures involving human subjects were reviewed and approved by the respective ethics committees in Ghana (Committee on Human Research, Publication and Ethics, Kwame Nkrumah University of Science and Technology, Kumasi), the Netherlands (Medical Ethics Review Committee, Academic Medical Centre, University of Amsterdam), the UK (Observational/Interventions Research Ethics Committee, London School of Hygiene and Tropical Medicine), and Germany (Ethics Commission, Charité-Universitaetsmedizin Berlin). Written informed consent was obtained from all participants.
Role of the funding source. The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the manuscript and in the decision to submit the paper for publication.

Results
Table S1 shows the characteristics of the total study population (N = 5898), stratified by diabetes status. In brief, participants with aDM (n = 541) had a mean age of 53.2 years (SD: 9.5 years), and 56% were women. Most individuals with aDM lived in Amsterdam (32%), followed by urban Ghana (25%), London (19%), Berlin (13%) and rural Ghana (10%). Two-thirds had no or only elementary formal education and had manual occupations; 14% were smokers. The median duration of aDM was 5 years (IQR: 1-11 years), and many participants with aDM had a family history of diabetes (41%). The mean HbA1c was 7.7% (SD: 2.2%). The proportion of individuals with GAD65Ab-positivity was similar between participants with diabetes mellitus and individuals without the disease (5% versus 6%). More than two-thirds of individuals with aDM were overweight or obese (BMI ≥ 25.0 kg/m²) and 56% had abdominal obesity (waist circumference > 102 cm for men and > 88 cm for women).
Figure 1 displays the proportions of aDM subgroups derived by cluster analyses using two sets of clinical and anthropometric variables. Figure 1A shows the results for the first approach, based on age of disease onset, BMI, HbA1c, HOMA-beta, HOMA-IR, and the presence of GAD65Ab; Fig. 1B depicts the identified subgroups from the second approach, based on age of disease onset, waist circumference, FPG, and insulin. The distributions of cluster characteristics are shown in Fig. 2, and the detailed characteristics of the identified subgroups are presented in Table 1.

Subgroups and their characteristics.
In the first approach, we reproduced the five subgroups reported by Ahlqvist et al. 9 : cluster 1 (obesity-related, 73%), cluster 2 (age-related, 10%), cluster 3 (autoimmune-related, 5%), cluster 4 (insulin-deficient, 7%), and cluster 5 (insulin-resistant, 5%). The obesity-related subgroup was characterized by late age of disease onset. The age-related subgroup also showed late age of disease onset and high frequency of female cases. The autoimmune-related subgroup (GAD65Ab-positive) had a younger age at disease onset and the least frequent family history of diabetes. The insulin-deficient subgroup was characterized by early age of disease onset, and the lowest serum insulin levels. For the insulin-resistant subgroup, the majority of cases were women with the highest serum insulin levels, HOMA-IR and HOMA-beta. The stability of subgroups according to k-means clustering was robust, except for the discrimination between autoimmune and insulin-deficient subgroups. These were extracted as one subgroup by k-means analysis (Table S2).
In the second approach, we identified four subgroups: cluster 1 (obesity- and age-related, 68%), cluster 2 (malnutrition-related, 8%), cluster 3 (body fat-related and insulin-resistant, 18%), and cluster 4 (ketosis-prone, 6%). Despite the fact that we omitted GAD65Ab from this cluster analysis, 23 out of the 29 individuals with GAD65Ab clustered in the ketosis-prone subgroup. The malnutrition-related subgroup had the lowest serum insulin concentration and HOMA-beta, and showed the lowest BMI, waist circumference and fat mass. The body fat-related and insulin-resistant subgroup was characterized by female preponderance, high fasting glucose concentration, and high serum insulin levels. This was also seen for the ketosis-prone subgroup, who also showed high BMI and waist circumference; 6% of them had urinary ketone bodies. Again, the stability of subgroups was confirmed through k-means clustering, with some overlap between the obesity- and age-related subgroup and the body fat-related and insulin-resistant subgroup (Table S2).
Overlaps between the two approaches are presented in Supplementary Figure S1. Most individuals in the obesity-related subgroup of the first approach remained in the obesity- and age-related subgroup of the second approach. Individuals who were clustered in the age-related subgroup (first approach) transitioned to the body fat-related and insulin-resistant subgroup (second approach). Also, most participants in the insulin-deficient subgroup (first approach) were relocated to the malnutrition-related subgroup (second approach).

Medication and diabetic complications by subgroups.
We describe the distributions of medications across subgroups in Table 1 and present the proportions of diabetic complications by subgroup in Fig. 3. The majority of study participants received oral glucose-lowering medication (79%), followed by lifestyle modifications (55%) and insulin injections (15%); these categories overlap. Almost half of the participants with aDM received insulin treatment immediately after diagnosis (48%), and one-third during the first 6 months after diagnosis (Table 1). Regarding diabetic complications, microvascular conditions (retinopathy, nephropathy, albuminuria) were more common than macrovascular complications (CAD, PAD, stroke; Fig. 3). Kidney disease was most prominent, with about one-third of the participants fulfilling the criteria for nephropathy. This was followed by self-reported retinopathy, amounting to 14% in the total study population. Macrovascular complications were seen in 7% for estimated CAD, 4% for PAD, and 3% for self-reported stroke.
Among the subgroups derived from the first approach, the following diabetic complications were observed (Fig. 3A,C). The age-related subgroup showed the highest proportions of estimated CAD (18%) and self-reported stroke (13%). Liver enzymes were increased in this subgroup (Table 1). In the autoimmune-related subgroup, markers of kidney dysfunction and PAD were more prominent than in other subgroups. The insulin-deficient subgroup showed the highest proportion of retinopathy (24%). Yet, neither the proportions of complications nor the distributions of diabetes medication were significantly different between the subgroups.
The four subgroups derived from the second approach were characterized as follows (Fig. 3B,D). The ketosis-prone subgroup showed preponderance of albuminuria and nephropathy. The malnutrition-related subgroup had the highest proportion of retinopathy. Yet again, none of the between-group differences were statistically significant. Regarding medications, almost two-thirds of the body fat-related and insulin-resistant subgroup were prescribed diet and physical activity, while this applied to less than half of the ketosis-prone subgroup. Both insulin and glucose-lowering medication were most frequently prescribed in the ketosis-prone subgroup (Table 1).

Sensitivity analyses.
We performed two sensitivity analyses. First, we repeated the second cluster analysis approach to account for the influence of insulin treatment on serum insulin concentration. For this analysis, 87 out of 541 individuals who were on insulin medication were excluded. Supplementary Table S3 shows the corresponding subgroups. Again, the largest cluster was the obesity- and age-related subgroup (40%), followed by malnutrition-related diabetes (39%), body fat-related and insulin-resistant diabetes (18%), and ketosis-prone diabetes (3%).
Also, we derived aDM subgroups using the Ghana-specific set of variables and stratified by location (Ghana versus Europe) to account for contextual differences. The proportions of subgroups in the two locations are shown in Supplementary Figure S2, and their characteristics are presented in Supplementary Table S4. The subgroups derived per location were similar in their characteristics but differed in proportions. Malnutrition-related diabetes was more common in Ghana (14%) than in Europe (1%), while this was the opposite for the obesity- and age-related subgroup (Ghana: 29%; Europe: 47%). The proportions of ketosis-prone diabetes and the body fat-related and insulin-resistant subgroup were similar between Ghana and Europe (Table S4). Finally, the distributions of complications across the site-specific subgroups are presented in Supplementary Figure S3. In Ghana and Europe, kidney dysfunction was most prevalent in the subgroups characterized by insulin-resistance. Macrovascular complications were rather common in ketosis-prone diabetes. The occurrence of retinopathy did not dominate in any subgroup.

Discussion
Summary of main results. Here, we reproduced, for the first time among sub-Saharan African adults, subgroups of aDM using data-driven cluster analysis with six simple clinical variables. We extracted the same five clusters as in the original study 9 : obesity-related (73%), age-related (10%), autoimmune-related (5%), insulin-deficient (7%), and insulin-resistant (5%). In comparison to European and Asian populations 10 , there were marginal differences in the characteristics of subgroups, their proportions, and the distributions of diabetic complications across clusters. In a second approach, we derived subgroups by employing variables with established clinical relevance for Ghanaian populations. This approach yielded four subgroups that had only some overlap with the five initial clusters regarding clinical, treatment-related and complication profiles: obesity- and age-related (68%), malnutrition-related (8%), body fat-related and insulin-resistant (18%), and ketosis-prone (6%).
Reproducibility of subgroups. The first approach generated similar clusters compared to those initially discovered in Scandinavian populations 9 . However, the RODAM Study population showed differences in the occurrence and some characteristics of these clusters. A recent systematic review summarizes data of 14 studies that have employed unsupervised learning methods to derive subgroups of adult-onset diabetes in populations from Asia, Europe, the USA, and Australia 10 . In five of them, the same five subgroups were extracted. While the obesity-related subgroup is also highly prevalent in previous studies (20-34%), this cluster amounted to 73% among the present Ghanaian sample. Further, the age-related subgroup constitutes the most common cluster in non-Ghanaian populations (34-45%). Yet, only 10% of individuals with aDM in the RODAM Study exhibited an age-related phenotype. Another difference is discernible for the insulin-resistant subgroup. This yielded 7-24% in previous studies but only 5% in the present population. These findings might be explained by the poor discriminative ability of BMI for diabetes mellitus among Ghanaians 26,27 and the suggested large proportion of metabolically healthy obesity among African populations 29 . In fact, all subgroups in this study, except for the age-related cluster in the first approach, showed high BMI values and low HOMA-IR (Table 1). Further, the proportions of the insulin-deficient (7%) and the autoimmune subgroups (5%) in our study ranked at the lower end of the previously reported occurrences (3-22%). The subgroup of autoimmune diabetes mellitus might include individuals with LADA or type 1 diabetes mellitus. Yet, the concentrations of GAD65Ab and the proportions of autoantibody-positive individuals are similar between participants with and without aDM 22 . Hence, this biomarker may not discriminate well for diabetes status in this population, and the meaning of GAD65Ab positivity in the latter subgroup remains unclear.
Further, the small proportion of the insulin-deficient subgroup may result from the presence of ketosis-prone diabetes, possibly hidden within the age-related cluster (Figure S1). A considerable proportion of individuals with aDM from sub-Saharan Africa have ketosis-prone diabetes, characterized by urinary ketone bodies and preserved beta-cell function after stabilization of blood glucose through initial insulin treatment 30 .
In fact, the second approach appears to separate these so-called atypical forms of diabetes mellitus from the conventional types. Many features of the ketosis-prone subgroup (6%) from the second approach accord with previously published features of this atypical phenotype: Age at onset was in the fourth decade, family history of diabetes was common, BMI and body fat percentage were high, 6% of these patients presented with urinary ketones, half of the participants received insulin immediately at diagnosis, and beta-cell reserve was moderate (Table 1). In addition, the malnutrition-related subgroup (8%) exhibited characteristics of the previously described protein-deficient pancreatic diabetes and fibro-calculous pancreatic diabetes 30 : Age at onset was rather young, family history of diabetes was less prominent than in other clusters, BMI and body fat percentage were low, and beta-cell function was impaired. Notably, the malnutrition-related subgroup was more common in Ghana, which corroborates our hypothesis that environmental triggers may outweigh genetic predisposition in aDM 22 . Further, the obesity- and age-related subgroup presented as the largest cluster (68%). This subgroup resembled the obesity-related subgroup identified in the first approach; in particular, mean BMI and mean HbA1c were high, while insulin resistance was moderate. The cluster characteristics largely overlapped with the body fat-related and insulin-resistant subgroup (18%), except for the high values of mean HOMA-IR in the latter. Therefore, the body fat-related and insulin-resistant subgroup is similar in proportion and features to the severe insulin-resistant diabetes (SIRD) cluster derived in other populations 10 .
Therapeutic strategies for glycaemic control and prevention of diabetic complications. Recently, Tanabe et al. 30 have suggested therapeutic strategies for the observed subgroups along a continuum of insulin demand and secretory capacity for the prevention of diabetic complications. For insulin-deficient subgroups, including the autoimmune-related and the insulin-deficient clusters, insulin therapy appears to be the treatment of choice, possibly in combination with oral insulin secretagogues, to stabilize long-term blood glucose (HbA1c) and to prevent the development of retinopathy, the most common diabetic complication in these subgroups 12 . This is corroborated by our findings in the RODAM Study for both analysis approaches. Yet, we also noticed some discrepancies between our results and the previously reported treatment and complication profiles of those clusters characterized by insulin deficiency. Among Ghanaians, kidney dysfunction and PAD were also common in those subgroups characterized by high proportions of GAD65Ab; they were mainly prescribed oral glucose-lowering medication. This difference may stem from our cross-sectional study design and the inclusion of prevalent aDM cases with existing diabetic complications. Indeed, disease duration affects the cluster allocation, as individuals potentially transition from one subgroup to another over time 13 . This remains to be investigated when follow-up data of the RODAM study population become available. For subgroups characterized by older age of disease onset, metformin and dipeptidyl peptidase-4 inhibitors (DPP4i) have been proposed to prevent the observed risks of coronary events 31 . In the RODAM Study population, similar treatment profiles and diabetic complications (stroke, CAD, PAD) were discernible in the age-related subgroup (first approach) and in the obesity- and age-related subgroup (second approach).
Yet, lifestyle modifications were also frequently prescribed for the obesity- and age-related subgroup derived in the second approach, possibly due to the large proportion of individuals with insulin resistance and obesity. In other studies, this subgroup emerged as the obesity-related cluster with high BMI and frequent occurrence of liver dysfunction. For disease management, the literature suggests emphasizing weight control with diet and physical exercise plus metformin and other oral glucose-lowering medication 31 . This strategy is promoted even more strongly for insulin-resistant subgroups and comprises a combination of weight loss through lifestyle and surgery, as well as insulin-sensitizing drugs for the prevention of microvascular complications, mainly kidney dysfunction 31 . [...] confirm these treatment and complication profiles, whereas the body fat-related and insulin-resistant subgroup from the second approach was dominated by liver dysfunction and cardiovascular events.

Strengths and limitations.
This analysis uniquely adds to understanding the aetiology of aDM, and thereby, contributes to improved clinician counseling and pharmacological interventions. We provide evidence that subgroups of diabetes mellitus may be reproduced in sub-Saharan African populations, when using the previously published clinical variables and methodological approach. Clearly, our results need to be interpreted with caution, as we studied prevalent aDM with different disease durations. These factors might have influenced cluster allocation, proportions, and complication profiles. As a research prospect, we aim at verifying our results in the follow-up data, 5 years after baseline. Here, we have applied hierarchical clustering followed by k-means clustering, which are more suitable when aiming to separate subgroups. Other colleagues have used soft-clustering as an alternative 32 , which allows participants to be represented in more than one subgroup, an approach that might have less clinical relevance but could facilitate the study of disease aetiology. Also, future work might expand to classify various states of dysglycemia, including impaired glucose tolerance and prediabetes. Reassuringly, the clusters in our study remained stable when individuals on insulin treatment were excluded from the analysis (Ghana-specific approach). Additional variables can aid the derivation of aDM subgroups, including metabolomics and genomics 9,10 . Yet, we refrained from these approaches as they are unlikely to be incorporated into clinical practice. Also, we acknowledge that HOMA-beta and HOMA-IR were based on fasting insulin, not on fasting C-peptide, which contrasts with the methodology by Ahlqvist and colleagues. For HOMA-IR, insulin-based values may overestimate insulin resistance among individuals on insulin medication. Still, sensitivity analysis excluding participants on insulin treatment (16%) produced similar subgroups.
For HOMA-beta, insulin-based calculations might underestimate beta-cell function due to hepatic insulin storage, but they reflect impaired hepatic insulin clearance, which likely contributes to insulin resistance 28 . In this cross-sectional study, we cannot comment on the stability of these clusters, as individuals can migrate over time and clinical features overlap between the identified subgroups. The RODAM Study population has been thoroughly phenotyped regarding biomarkers of glucose metabolism, anthropometric measurements, and several diabetic complications. Yet, we acknowledge that retinopathy, stroke and CAD were assessed by self-report only. Just as a large proportion of individuals with aDM from sub-Saharan Africa remains undiagnosed 3 , so do their diabetic complications. Therefore, self-reported complications might have underestimated the true proportions in our study population, particularly for microvascular complications, and thus, contributed to type II error. Similarly, the various oral glucose-lowering medications were not documented in this study; this limits the interpretation of treatment-related characteristics across subgroups.

Conclusion
In this Ghanaian study population, the first approach (replication from Ahlqvist et al. 9 ) provided subgroups that separated well between diabetic complications, but differentiated less sharply between phenotypes. This was the opposite for the second approach (Ghana-specific variables): It produced well-known diabetic phenotypes among African populations, which, however, did not separate well for diabetic complications. Pending verification in prospective studies with incident aDM among sub-Saharan African populations, our findings suggest that cluster analysis based on routinely collected clinical data can contribute to the understanding of diabetes aetiology and possibly disease progression.

Data availability
The study protocol and statistical analysis plans were previously published 19 . Individual participant data will be shared with researchers in a deidentified or anonymized format upon submitting a research proposal and requesting data access to Prof. Charles Agyemang, UMC Amsterdam (c.o.agyemang@amsterdamumc.nl). Data will be made available for analyses as approved by the data access committee.